Sequential delay analysis by placement engines

ABSTRACT

Some embodiments provide a method of designing an integrated circuit (IC). The design is expressed as a graph that includes several nodes that represent several IC components. The nodes include a first set of nodes that represent a set of clocked elements. The method creates a second set of nodes by removing all nodes in the first set from the nodes that represent the IC components. The method identifies a set of edges that connect two nodes in the second set without encompassing a third node in the second set. The method assigns an event time to each node in the second set. The method assigns a cost function based on the event times of the nodes connected by each edge and the number of nodes in the first set encompassed by each edge. The method optimizes the cost function and places the components based on the cost function optimization.

FIELD OF THE INVENTION

The present invention is directed towards placement engines for integrated circuits.

BACKGROUND OF THE INVENTION

An IC is a device that includes numerous electronic components (e.g., transistors, resistors, diodes, etc.) that are typically embedded on the same substrate, such as a single piece of semiconductor wafer. These components are connected with one or more layers of wiring to form multiple circuits, such as Boolean gates, memory cells, arithmetic units, controllers, decoders, etc. An IC is often packaged as a single IC chip in one IC package, although some IC chip packages can include multiple pieces of substrate or wafer.

Electronic Design Automation (EDA) tools are automated tools used in IC design. Placement and routing are steps in automatic design of ICs in which a layout of a larger block of the circuit or the whole circuit is created from layouts of smaller sub-blocks. During placement, the positions of the sub-blocks in the design area are determined. These sub-blocks are interconnected during routing. A placer assigns exact locations for circuit components within the IC chip's core area. A placer typically has several objectives such as minimizing total wire length, timing optimization, reducing congestion, and minimizing power. The placer takes a given synthesized circuit netlist with a technology library and produces a placement layout. The layout is optimized according to a set of placer objectives.

The maximum delay through the critical path of a chip determines the clock cycle and, therefore, the speed of the chip. The timing optimization is performed to ensure that no path exists with delay exceeding a maximum specified delay.

SUMMARY OF THE INVENTION

Some embodiments provide a method of designing an integrated circuit (IC). The design is expressed as a graph that includes several nodes that represent several IC components. The nodes include a first set of nodes that represent a set of clocked elements. The method creates a second set of nodes by removing all nodes in the first set from the nodes that represent the IC components. The method identifies a set of edges that connect two nodes in the second set without encompassing a third node in the second set. The method assigns an event time to each node in the second set. The method assigns a cost function based on the event times of the nodes connected by each edge and the number of nodes in the first set encompassed by each edge. The method optimizes the cost function and places the components based on the cost function optimization.

In some embodiments, the cost function is further based on horizontal and vertical coordinates of the nodes on the graph. In some embodiments, the cost function is optimized by changing at least one of the event time and a coordinate of a node. In some embodiments, all clocked elements in the first set of nodes are retimable clocked elements. In some embodiments, the nodes in the second set include clocked elements that cannot be retimed. In some embodiments, the nodes in the second set include input and output nodes of the graph. In some embodiments, the nodes in the second set include any nodes with timing constraints. In some embodiments, the nodes in the second set include storage elements. In some embodiments, the IC is an application-specific integrated circuit (ASIC), structured ASIC, field-programmable gate array (FPGA), programmable logic device (PLD), complex programmable logic device (CPLD), system on chip (SOC), or system-in-package (SIP).

In some embodiments, the IC is a reconfigurable IC that includes at least one reconfigurable circuit that reconfigures during an operation of the IC. In some embodiments, at least one reconfigurable circuit can reconfigure at a first clock rate that is faster than a second clock rate which is specified for a particular design of the IC. In some embodiments, the second clock has a clock cycle that includes several sub-cycles. In these embodiments, placing the IC components includes assigning each node in the second set of nodes to a particular sub-cycle of the second clock.

Some embodiments provide a method of designing an integrated circuit (IC). The method optimizes a cost function that includes at least one time variable. The method places the IC components based on the cost function optimization. The placing is performed only once after optimizing the cost function. In some embodiments, the time variable includes several event times that are assigned to the components in the IC design. In some embodiments, the cost function further includes horizontal and vertical coordinates of each component. The cost function is optimized by changing at least one of the event time and a coordinate of a component.

Some embodiments provide a method of designing an integrated circuit (IC). The IC design is expressed as a graph that includes several edges and several nodes that represent several IC components. Each edge connects two nodes without encompassing a third node. The method assigns an event time to each node in the graph. The method assigns a cost function for each edge based on the event times of the nodes connected by each edge. The method optimizes the cost function and places the IC components based on the optimized cost function. In some embodiments, the cost function is further based on horizontal and vertical coordinates of the nodes on the graph. In some embodiments, the cost function is optimized by changing at least one of the event time and a coordinate of a node. In some embodiments, the IC is an application-specific integrated circuit (ASIC), structured ASIC, field-programmable gate array (FPGA), programmable logic device (PLD), complex programmable logic device (CPLD), system on chip (SOC), system-in-package (SIP), or reconfigurable IC.

BRIEF DESCRIPTION OF THE DRAWINGS

The novel features of the invention are set forth in the appended claims. However, for purpose of explanation, several embodiments of the invention are set forth in the following figures.

FIG. 1 illustrates a path that includes computational and clocked elements between a source node and a target node in some embodiments.

FIG. 2 conceptually illustrates an example of a sub-cycle reconfigurable IC.

FIG. 3 illustrates an example of combinational delay computation in some embodiments.

FIG. 4 conceptually illustrates a process to compute accumulated delays in some embodiments.

FIG. 5 conceptually illustrates a process that utilizes combinational delay computation to determine whether a path has met its timing requirements in some embodiments.

FIG. 6 illustrates a path in which clocked elements are conceptually replaced with non-computational elements with negative delays.

FIG. 7 illustrates an example of sequential delay computation in some embodiments.

FIG. 8 conceptually illustrates a process that utilizes sequential delay computation to determine whether a path has met its timing requirements in some embodiments.

FIG. 9 illustrates retiming clocked elements across a path in some embodiments to meet timing in all timed paths.

FIG. 10 conceptually illustrates a process performed by a placement engine to do timing analysis using sequential delay computations.

FIG. 11 illustrates an example of problems with calculation of sequential delay in a circuit with a failing loop.

FIG. 12 illustrates a table that shows the results of several iterations of sequential delay calculations for FIG. 11.

FIG. 13 illustrates a path with a source and a target node in some embodiments.

FIG. 14 conceptually illustrates a process performed by a placement engine to do timing analysis in some embodiments.

FIG. 15 illustrates a process that conceptually shows timing analysis performed by a placement engine in some embodiments using sequential delay calculation.

FIG. 16 illustrates conceptual removal of clocked elements from sequential edges in some embodiments to facilitate sequential delay calculation.

FIG. 17 conceptually illustrates a process that identifies a sub-cycle for performing each computational element in a netlist in some embodiments.

FIG. 18 illustrates a path with event times assigned to computational elements of a reconfigurable IC in some embodiments.

FIG. 19 illustrates the path of FIG. 18 after certain computational elements are moved to other operational cycles.

FIG. 20 illustrates a timeline for performing computational elements of FIG. 19.

FIG. 21 illustrates a timeline for performing computational elements of FIG. 19.

FIG. 22 illustrates an electronics system with which some embodiments of the invention are implemented.

DETAILED DESCRIPTION OF THE INVENTION

In the following description, numerous details are set forth for purpose of explanation. However, one of ordinary skill in the art will realize that the invention may be practiced without the use of these specific details. For instance, not all embodiments of the invention need to be practiced with the specific devices referred to below. In other instances, well-known structures and devices are shown in block diagram form in order not to obscure the description of the invention with unnecessary detail.

I. Overview

Some embodiments provide a method of designing an integrated circuit (IC). The design is expressed as a graph that includes several nodes that represent several IC components. The nodes include a first set of nodes that represent a set of clocked elements. The method creates a second set of nodes by removing all nodes in the first set from the nodes that represent the IC components. The method identifies a set of edges that connect two nodes in the second set without encompassing a third node in the second set. The method assigns an event time to each node in the second set. The method assigns a cost function based on the event times of the nodes connected by each edge and the number of nodes in the first set encompassed by each edge. The method optimizes the cost function and places the components based on the cost function optimization.

Several more detailed embodiments of the invention are described in sections below. Before describing these embodiments further, several terms and concepts used in the disclosure are described below in Section II. This discussion is followed by the discussion in Section III of combinational and sequential delays. Next, Section IV describes several embodiments of placement engines that also perform timing analysis. Last, Section V describes an electronics system with which some of the embodiments of the invention are implemented.

II. Terms and Concepts

A. Graph Representation of IC Designs

A netlist is a graph representation of an IC design. The graph is represented by a collection of nodes and edges. The nodes represent components of the IC and the edges represent connections between these components. The edges connect the nodes but do not go through (i.e., do not encompass) any nodes. In an IC design, each component lies on one or more signal paths (“paths”). A path is a sequence of nodes and edges in a netlist. The starting node is referred to as the source node and the end node is referred to as the sink or target node. The source and target nodes are also referred to as endpoints of a path. The source and target designations of the endpoints are based on the direction of the signal flow through the path.

A timed path is a path whose endpoints are both timed elements. Timed elements include primary inputs (through which the circuit receives external input), primary outputs (through which the circuit sends outputs to external circuits), clocked elements, storage elements, or any node with timing constraints (e.g., a node with a fixed time, either because the node cannot be retimed or because the time at which the node should occur is specified).

FIG. 1 illustrates a timed path 100 between a source node 105 and a target node 110 in some embodiments. The path 100 includes six computational elements 115 (shown as circles), three clocked elements 120 (shown as rectangles), and ten edges 125. Some embodiments utilize registers or latches as clocked elements. FIG. 1 also shows other smaller timed paths. For instance, the path starting from the source node 105 and ending at clocked element 120 is also a timed path. In order for an IC design to meet timing requirements, total delay (including computation delays and signal propagation delays) on each timed path must be less than or equal to one clock period.

The arrival time of a signal is the time elapsed for a signal to arrive at a certain point. The reference, or time 0, is taken from a source node. In some embodiments, when the source node is a primary input, the reference time is taken as the arrival time of a signal received at the primary input. Also, when the source node is a clocked element, the reference time is taken as the time a clock signal is received at the clocked element.

To calculate the arrival time of a signal at a node, delay calculations for all components and edges on the path are required. The required time is the latest time at which a signal can arrive without making the clock cycle longer than desired.

The time difference between the arrival time of a signal and the required arrival time of the signal is referred to as slack. The slack for a node is expressed by the following equation (A):

slack=required arrival time−arrival time  (A)

A positive (or zero) slack at a node indicates that the node has met its timing requirements. A positive slack also implies that the arrival time of the signal at that node may be increased by the value of the slack without affecting the overall delay of the circuit. Conversely, a negative slack implies that a path is too slow. Therefore, the path must be sped up (or the reference signal delayed) if the whole circuit is to work at the desired speed. A critical path is defined as a timed path with the largest negative slack.
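Equation (A) and the pass/fail rule above can be sketched in a few lines of Python. This is only an illustration: the function names and the numbers in `path_slacks` are hypothetical, and returning `None` when no path has negative slack is a choice made for this sketch rather than something stated in the description.

```python
def slack(required_arrival_time, arrival_time):
    """Equation (A): slack = required arrival time - arrival time."""
    return required_arrival_time - arrival_time


def meets_timing(required_arrival_time, arrival_time):
    """A node meets its timing requirements when slack is zero or positive."""
    return slack(required_arrival_time, arrival_time) >= 0


def critical_path(path_slacks):
    """Return the name of the path with the most negative slack.

    Returns None when no path has negative slack (an assumption of this sketch).
    """
    name = min(path_slacks, key=path_slacks.get)
    return name if path_slacks[name] < 0 else None


# Hypothetical timed paths, each with a required time of 4 time units.
path_slacks = {
    "p1": slack(4, 6),  # negative slack: the path is too slow
    "p2": slack(4, 4),  # zero slack: the path just meets timing
    "p3": slack(4, 2),  # positive slack: arrival may slip by the slack value
}
print(path_slacks)
print(critical_path(path_slacks))
```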

B. Configurable IC's

A configurable IC is a circuit that can “configurably” perform a set of operations. Specifically, a configurable circuit receives “configuration data” that specifies the operation that the configurable circuit has to perform from the set of operations that it can perform. In some embodiments, the configuration data is generated outside of the configurable IC. In these embodiments, a set of software tools converts a high-level IC design description (e.g., a circuit representation or a hardware description language design) into a set of configuration data that can configure the configurable IC (or more accurately, the configurable circuits of the configurable IC) to implement the IC design.

C. Reconfigurable IC's

A reconfigurable IC is a configurable IC that has at least one circuit that reconfigures during runtime. In other words, a reconfigurable IC is an IC that has reconfigurable logic circuits and/or reconfigurable interconnect circuits, where the reconfigurable logic and/or interconnect circuits are configurable logic and/or interconnect circuits that can “reconfigure” more than once at runtime. A configurable logic or interconnect circuit reconfigures when it receives a different set of configuration data. Some embodiments of the invention are implemented in reconfigurable ICs that are sub-cycle reconfigurable (i.e., can reconfigure circuits on a sub-cycle basis). In some embodiments, a reconfigurable IC has a large number of logic and interconnect circuits (e.g., hundreds, thousands, etc. of such circuits). Some or all of these circuits can be reconfigurable.

In some embodiments, runtime reconfigurability means reconfiguring without resetting the reconfigurable IC. Resetting a reconfigurable IC entails in some cases resetting the values stored in the state elements of the IC, where state elements are elements like latches, registers, and non-configuration memories (e.g., memories that store the user signals as opposed to the memories that store the configuration data of the configurable circuits). In some embodiments, runtime reconfigurability means reconfiguring after the reconfigurable IC has started processing of the user data. Also, in some embodiments, runtime reconfigurability means reconfiguring after the reconfigurable IC has powered up. These definitions of runtime reconfigurability are not mutually exclusive. Examples of configurable and reconfigurable ICs are described in detail in U.S. patent application Ser. No. 11/081,859, “Configurable IC with Interconnect Circuits that also Perform Storage Operations”, filed on Mar. 15, 2005.

D. Sub-Cycle Reconfigurable IC

FIG. 2 conceptually illustrates an example of a sub-cycle reconfigurable IC. Specifically, in its top left hand corner, this figure illustrates an IC design 205 that operates at a clock speed of X MHz. Typically, an IC design is initially specified in a hardware description language (HDL), and a synthesis operation is used to convert this HDL representation into a circuit representation. After the synthesis operation, the IC design includes numerous electronic circuits, which are referred to below as “components.”

As further illustrated in FIG. 2, the operations performed by the components in the IC design 205 can be partitioned into four sets of operations 210-225, with each set of operations being performed at a clock speed of X MHz. FIG. 2 then illustrates that these four sets of operations 210-225 can be performed by one sub-cycle reconfigurable IC 230 that operates at 4X MHz. In some embodiments, four cycles of the 4X MHz clock correspond to four sub-cycles within a cycle of the X MHz clock. Accordingly, this figure illustrates the reconfigurable IC 230 reconfiguring four times during four cycles of the 4X MHz clock (i.e., during four sub-cycles of the X MHz clock). During each of these reconfigurations (i.e., during each sub-cycle), the reconfigurable IC 230 performs one of the identified four sets of operations. In other words, the faster operational speed of the reconfigurable IC 230 allows this IC to reconfigure four times during each cycle of the X MHz clock, in order to perform the four sets of operations sequentially at a 4X MHz rate instead of performing the four sets of operations in parallel at an X MHz rate.

III. Combinational and Sequential Delays

A. Combinational Delay

Combinational delay computation is performed on a path that starts from a clocked element source node and ends at a clocked element target node without encompassing any other clocked elements. Alternatively, the path can either start with any timed element and end at a clocked element or start with a clocked element and end at a timed element without encompassing any other clocked elements. The delay starts at zero and is accumulated as the path is traversed in the signal direction from a source node to a target node.

FIG. 3 illustrates a path 300 in some embodiments. As shown, path 300 starts from a primary input node 305 and ends at a primary output node 310. The path includes six computational elements (shown as circles) and three clocked elements (shown as rectangles). For this example, the clock period is assumed to be four time units. Also, for simplicity it is assumed that each computational element takes two time units to perform its operation.

Furthermore, for simplicity, it is assumed that there are no delays attributed to wiring lengths in this example. Alternatively, the delays attributed to wiring lengths between two endpoints can be added to the delay of the target node. As shown in FIG. 3, path 300 includes four smaller paths that either (1) start from a clocked element source node and end at a clocked element target node, (2) start from a timed element source node and end at a clocked element target node, or (3) start from a clocked element source node and end at a timed element target node. None of these paths encompasses any other clocked elements. These four paths are the paths between (1) timed element 305 and clocked element 315, (2) clocked element 315 and clocked element 320, (3) clocked element 320 and clocked element 325, and (4) clocked element 325 and timed element 310. The combinational delay for each of these paths is the accumulated delay between the source and target nodes and is calculated by adding all computation and propagation delays between the source and target nodes.

Calculation of accumulated delays for each path is described by reference to FIGS. 4 and 5. FIG. 4 conceptually illustrates a process 400 for calculating accumulated delays for a path that starts with a source node and ends with a target node. As shown in FIG. 4, the process identifies (at 405) the source and target nodes of the path. Next, the process sets (at 410) the accumulated delay at the source node to zero.

Next, the process accumulates delays from the source node to the target node by adding the delays for each computational element. If the delays caused by interconnect wire lengths are not negligible, the process also adds (at 415) these delays to the accumulated delays. Also, when a node has more than one input, the delay of the input path with the maximum delay is used in computing the accumulated delay.
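The accumulation performed by process 400 can be sketched as follows. This is a minimal illustration, assuming the path is given as nodes in topological order with an explicit fan-in map; the function name `accumulate_delays`, the data layout, and the node names are hypothetical and are not taken from the figures.

```python
def accumulate_delays(order, fanin, node_delay, wire_delay=None):
    """Accumulate delays from a source node toward a target node.

    order:      nodes in topological order, source node first
    fanin:      dict mapping a node to the list of nodes driving it
    node_delay: dict mapping a node to its own computation delay
    wire_delay: optional dict mapping (driver, node) to interconnect delay
    """
    wire_delay = wire_delay or {}
    accumulated = {}
    for node in order:
        drivers = fanin.get(node, [])
        if not drivers:
            # Source node: the accumulated delay is set to zero (410).
            accumulated[node] = 0
            continue
        # With several inputs, the input path with the maximum delay is used.
        worst_input = max(
            accumulated[d] + wire_delay.get((d, node), 0) for d in drivers
        )
        # Add this node's own delay to the running total (415).
        accumulated[node] = worst_input + node_delay.get(node, 0)
    return accumulated


# Hypothetical netlist fragment: node "c" has two inputs, "src" and "b".
order = ["src", "a", "b", "c", "dst"]
fanin = {"a": ["src"], "b": ["a"], "c": ["src", "b"], "dst": ["c"]}
node_delay = {"a": 2, "b": 2, "c": 2}  # time units; endpoints add no delay

delays = accumulate_delays(order, fanin, node_delay)
print(delays)
```

Passing a `wire_delay` map adds interconnect delay per edge; omitting it corresponds to the simplifying assumption that wiring delays are negligible.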

FIG. 5 conceptually illustrates a process 500 that utilizes combinational delay computation to determine whether a path has met its timing requirements in some embodiments. As shown, the process determines (at 505) whether the accumulated delay is more than the required time for the signal to get from the source node to the target node. In the example of FIG. 3, the required time for each of the four identified paths is one clock period (or four time units). When the accumulated delay is more than the required time, the process determines (at 510) that the path does not meet its timing requirements. On the other hand, when the delay is less than or equal to the required time, the process determines (at 515) that the path meets its timing requirements.

Utilizing process 400, the combinational delays for the elements of the four paths identified on FIG. 3 are computed and the results are displayed on top of each element. Utilizing process 500, it is determined whether each path meets its timing requirements. As shown, when the combinational delay computations are compared with the required times for each target node, the timing requirement for the first target node (315) fails and the timing requirements for the other three target nodes (320, 325, and 310) pass. Since the path between 305 and 315 fails the timing requirement, the overall path 300, which includes the failed path, also fails the timing requirements.
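The FIG. 3 example can be reproduced with a short sketch. The element layout below (three, two, one, and zero computational elements in the four smaller paths) is an assumption: it is inferred from the retiming description given later for FIG. 9 and is only one arrangement consistent with the results stated above. The names `PATH`, `segment_delays`, and `meets_timing` are hypothetical.

```python
CLOCK_PERIOD = 4  # time units, as assumed for FIG. 3
COMP_DELAY = 2    # time units per computational element, as assumed for FIG. 3

# "C" is a computational element, "R" is a clocked element (assumed layout).
PATH = ["C", "C", "C", "R", "C", "C", "R", "C", "R"]


def segment_delays(path, comp_delay):
    """Return the accumulated delay at each clocked element and at the target.

    The accumulated delay restarts at zero after every clocked element,
    because each clocked element is the source node of the next smaller path.
    """
    delays = []
    accumulated = 0
    for element in path:
        if element == "R":
            delays.append(accumulated)
            accumulated = 0
        else:
            accumulated += comp_delay
    delays.append(accumulated)  # the last smaller path ends at the target node
    return delays


def meets_timing(accumulated_delay, required_time):
    """Process 500: pass when the delay does not exceed the required time."""
    return accumulated_delay <= required_time


delays = segment_delays(PATH, COMP_DELAY)
results = [meets_timing(d, CLOCK_PERIOD) for d in delays]
print(delays)        # delay at each of the four target nodes
print(results)       # pass/fail for each of the four smaller paths
print(all(results))  # whether the overall path meets timing
```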

B. Sequential Delay

Sequential delay computation is similar to combinational delay computation, except sequential delay computation accounts for paths that can go through clocked elements. FIG. 6 illustrates a path 600 in some embodiments. This path is similar to path 300. As shown in FIG. 6, the clocked elements 605 are conceptually replaced by non-computational elements 610 with a negative delay equal to one clock period, which accounts for the fact that the required time between two adjacent clocked elements (or a clocked element and an adjacent primary input or output) is one clock period. In FIG. 6, the delays associated with each node are shown under each node. In some embodiments, a path can also start from or end at a clocked element that cannot be retimed. In these embodiments, the outputs of the clocked elements that cannot be retimed are considered to occur at time zero.

FIG. 7 illustrates the results of sequential delay computation for path 600 shown in FIG. 6. Process 400, shown in FIG. 4 (which was discussed in reference to computing combinational delays), is also utilized to compute sequential delays. For the example of FIG. 7, process 400 identifies (at 405) node 705 as the source node and node 710 as the target node. The process does not identify any of the clocked elements as source or target nodes. In other words, the delays are not reset to zero after each clocked element. Instead, a delay equal to one clock period is subtracted from the accumulated delay to account for each clocked element. Specifically, process 400 sets (at 410) the accumulated delay for the source node 705 to 0.

Next, the delays are accumulated (at 415) through the clocked elements. Since clocked elements are assigned negative delays, the effect of each clocked element is subtraction of one clock period from the accumulated delay. The delay is accumulated until the target node 710 is reached. The results of these computations for each node are shown on top of the nodes in FIG. 7.
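The accumulation through clocked elements can be sketched as a running sum in which each clocked element contributes a delay of minus one clock period. The element layout below is an assumption inferred from the retiming description for FIG. 9, and the names `PATH` and `sequential_delays` are hypothetical.

```python
CLOCK_PERIOD = 4  # time units
COMP_DELAY = 2    # time units per computational element

# "C" is a computational element, "R" is a clocked element (assumed layout).
PATH = ["C", "C", "C", "R", "C", "C", "R", "C", "R"]


def sequential_delays(path, comp_delay, clock_period):
    """Return the accumulated delay after each element of the path.

    The source node starts at zero. The delay is never reset; a clocked
    element simply subtracts one clock period from the running total.
    """
    accumulated = 0
    running = []
    for element in path:
        accumulated += -clock_period if element == "R" else comp_delay
        running.append(accumulated)
    return running


running = sequential_delays(PATH, COMP_DELAY, CLOCK_PERIOD)
print(running)      # accumulated delay after each element
print(running[-1])  # accumulated delay reaching the target node
```

With this layout the running total reaching the target node is zero, which agrees with the accumulated delay reported for target node 710.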

FIG. 8 illustrates a process 800 that utilizes sequential delay computation to determine whether a path meets its timing requirements. As shown, the process determines (at 805) whether the accumulated delay of the target node is more than one clock period. When the accumulated delay of the target node is more than one clock period, the process determines (at 810) that the path from the source node to the target node cannot meet its timing requirements with the given clock period. On the other hand, when the accumulated delay of the target node is less than or equal to one clock period, the process determines (at 815) that the path can meet its timing requirements.

For instance, in FIG. 7, the accumulated delay of the target node 710 is zero. This accumulated delay is the sequential delay of path 700 that starts from the source node 705 and ends at the target node 710. The sequential delay being less than one clock period indicates that there exists a retiming of the clocked elements such that all elements meet their required timing. In the example of FIG. 7, since the accumulated delay for the target node is zero and the clock period is 4 time units, the path 700 can be retimed by moving the clocked elements across the path until the path meets its timing requirement.
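The test of process 800 can be sketched for a simple linear path. This is an illustration only: it assumes every computational element has the same delay and that wiring delays are negligible, and the function names are hypothetical. The second and third cases use made-up element counts to show a path at the limit and a path beyond it.

```python
def sequential_delay(num_comp, num_clocked, comp_delay, clock_period):
    """Sequential delay of a linear path with uniform element delays.

    Each computational element adds comp_delay and each clocked element
    subtracts one clock period (assumption: no wiring delay).
    """
    return num_comp * comp_delay - num_clocked * clock_period


def can_meet_timing(seq_delay, clock_period):
    """Process 800: the path can meet timing when the accumulated delay of
    the target node is at most one clock period."""
    return seq_delay <= clock_period


CLOCK_PERIOD = 4
COMP_DELAY = 2

# Six computational and three clocked elements, as in the FIG. 7 example.
fig7 = sequential_delay(6, 3, COMP_DELAY, CLOCK_PERIOD)
# Hypothetical variants with more computational elements on the same path.
at_limit = sequential_delay(8, 3, COMP_DELAY, CLOCK_PERIOD)
too_slow = sequential_delay(9, 3, COMP_DELAY, CLOCK_PERIOD)

for value in (fig7, at_limit, too_slow):
    print(value, can_meet_timing(value, CLOCK_PERIOD))
```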

C. Retiming

FIG. 9 illustrates the retiming of path 700 shown in FIG. 7 to make the path meet its timing requirements. In this example, the timing requirements are met when every path that either (1) starts from a clocked element and ends at a clocked element, (2) starts from a timed element and ends at a clocked element, or (3) starts from a clocked element and ends at a timed element, without encompassing any other clocked elements, meets its timing requirements (i.e., the arrival time of a signal at a target node is less than or equal to its required time).

As shown in FIG. 9, clocked element 725 which was originally between computational elements 725 and 730 is retimed to be between computational elements 715 and 720. In effect this clocked element is moved to an earlier point in time. As a result, the path between the source node 705 and clocked element 725 will have one computational element. Similarly, clocked element 740 that was originally between computational elements 735 and 745 is moved between computational elements 725 and 730. As a result the path between clocked elements 725 and 740 includes two computational elements. Further, clocked element 750 that was between computational element 745 and the target node 710 is moved between computational elements 730 and 735. As a result, the path between clocked element 750 and the target node 710 includes two computational elements.

Utilizing process 400, the combinational delays for the elements of the four paths (705 to 725, 725 to 740, 740 to 750, and 750 to 710) are computed and the results are displayed on top of each element. These four paths are the paths between two clocked elements or a clocked element and a non-clock timed element. None of the paths encompasses another clocked element other than the source and/or the target nodes.

Utilizing process 500, combinational delays are compared with required times for the signals to get from source to target nodes in each path. As shown in FIG. 9, when the combinational delay computations are compared with the required times for each target node, the timing requirements for all target nodes (725, 740, 750, and 710) are met. Since every path between two clocked elements or between one clocked element and the source or target nodes meets its timing requirement, the overall path 700 also passes the timing requirements (i.e., the path can be performed using the current clock period).
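The effect of the retiming can be checked with a small sketch that records, for each clocked element, how many computational elements precede it on the path. The position lists below are assumptions inferred from the description of FIG. 9 (the clocked elements move from after the third, fifth, and sixth computational elements to after the first, third, and fourth), and the names are hypothetical.

```python
CLOCK_PERIOD = 4  # time units
COMP_DELAY = 2    # time units per computational element
NUM_COMP = 6      # computational elements on the path


def segment_delays(register_positions, num_comp, comp_delay):
    """Combinational delay of each smaller path between timed elements.

    register_positions[i] is the number of computational elements that
    precede the i-th clocked element on the path.
    """
    bounds = [0] + sorted(register_positions) + [num_comp]
    return [(b - a) * comp_delay for a, b in zip(bounds, bounds[1:])]


original = [3, 5, 6]  # assumed positions before retiming
retimed = [1, 3, 4]   # assumed positions after retiming

before = segment_delays(original, NUM_COMP, COMP_DELAY)
after = segment_delays(retimed, NUM_COMP, COMP_DELAY)

print(before, all(d <= CLOCK_PERIOD for d in before))
print(after, all(d <= CLOCK_PERIOD for d in after))
```

Retiming only moves the clocked elements, so the number of clocked elements and the total computational delay on the path are the same before and after the move.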

IV. Placement Engines that Perform Timing Analysis

In some embodiments, the timing analysis is performed by the placement engine while the placement engine is optimizing other costs (such as wiring lengths and congestion) of the netlist. However, calculating sequential delays can be very time consuming. FIG. 10 illustrates a process 1000 that computes sequential delays and performs timing analysis for the netlist. As shown, the process identifies (at 1005) a cost function to optimize during the placement operation. The cost function includes a time variable to optimize. The cost function also includes one or more other variables such as wiring length and congestion to optimize.

Next, the process optimizes (at 1010) the cost function by changing one or more of the variables. Next, the process performs (at 1015) sequential delay timing analysis for each path from a source node to a target node in the netlist. These paths can go through clocked elements. The process then determines (at 1020) whether the target nodes in each path meet their timing requirements. When at least one target does not meet the timing requirements, the process proceeds to 1040 which is described below.

Otherwise, the process determines (at 1025) whether a shorter clock period can be examined to further improve the clock period. The process may utilize a binary search to find shorter values for the clock period until a clock period acceptable by the circuit design is reached or the clock period cannot be improved any further. When the process determines that the clock period cannot be improved any further, the process proceeds to 1065 which is described below. Otherwise, the process saves the current clock period as a clock period that has met the timing requirements. Next, the process decreases (at 1035) the clock period and proceeds to 1010 which was described above.

When the test at 1020 fails, the process determines (at 1040) whether all target nodes had met their timing requirements in a previous iteration. If not, the process increases (at 1040) the clock period and proceeds to 1010 that was described above. The process may utilize a binary search to find the next value for the clock period.

After 1040, when the process determines that all target nodes have met the timing requirements, the process determines (at 1050) whether a different clock period can be examined to further improve the clock period. Although the current clock period satisfies the timing requirements of all target nodes, some circuit designs may set a goal of further improving the clock period until a certain number of iterations are performed, the clock period becomes smaller than a certain value, the improvement in the clock period becomes negligible after a certain number of iterations, or other criteria are met.

When (after 1050) the process determines that the clock period can be further improved, a new clock period which is longer than the current clock period but shorter than the previously acceptable period is selected (at 1055). The process then proceeds to 1010 that was described above.

On the other hand, when (after 1045) the process determines that the clock period cannot be improved any further, the process restores (at 1060) the best value of the clock period that met the timing requirements in a previous iteration. Finally, the process analyzes (at 1065) each path in the netlist and retimes the clocked elements between computational elements to make the delays between each two adjacent clocked elements or between each timed element and its adjacent clocked elements less than or equal to a clock period. As shown in FIG. 10, the process has to repeatedly perform sequential timing analysis for each path in the netlist to determine whether the timing requirements are met or whether a shorter clock period can be identified.
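The clock period search of process 1000 can be sketched as a plain binary search. The following Python sketch is illustrative only and is not the process of FIG. 10 itself: `meets_timing` is a hypothetical stand-in for re-optimizing the cost function and running the sequential delay timing analysis at one clock period, and the sketch assumes that any period longer than a passing period also passes.

```python
from typing import Callable, Optional


def shortest_passing_period(meets_timing: Callable[[float], bool],
                            lo: float, hi: float,
                            tolerance: float = 0.01) -> Optional[float]:
    """Binary-search [lo, hi] for the shortest clock period that passes.

    meets_timing is a hypothetical predicate standing in for one full
    optimize-and-analyze iteration at the given clock period.
    """
    if not meets_timing(hi):
        return None          # even the longest candidate period fails
    best = hi                # best period known to meet timing so far
    while hi - lo > tolerance:
        mid = (lo + hi) / 2
        if meets_timing(mid):
            best = mid       # save the passing period, then try a shorter one
            hi = mid
        else:
            lo = mid         # timing failed: try a longer period
    return best


# Toy predicate (an assumption): the design passes at any period >= 4.3.
result = shortest_passing_period(lambda period: period >= 4.3, 1.0, 10.0)
```

Each call to the predicate corresponds to one pass through the costly analysis, which is why the number of iterations matters in practice.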

A. Failing Loops

Sequential delay computation is an expensive computation. When a netlist includes a loop and the clock period is relatively small, the sequential delay computation for the given clock period may not converge. FIG. 11 illustrates an example of a netlist 1100 with such a failing loop. As shown, computational element 1115 receives two inputs: one input from computational element 1110 and another input from clocked element 1145. The sequential delay of node 1115 is the maximum sequential delay of its inputs plus the computational delay of computational element 1115. In the example of FIG. 11, the clock period is assumed to be four time units and all computational elements are assumed to have a delay of two time units.

Utilizing process 400, an initial value for the sequential delays of the nodes in netlist 1100 is computed. FIG. 12 illustrates a Table 1200 that shows the results of sequential delay computations after several iterations. Initially, the sequential delay of node 1145 is not known. Therefore, the sequential delay for node 1115 is computed as the total of two time units for the delay of node 1110 and two time units for the delay of node 1115. The results of the initial computation for sequential delays of the nodes in netlist 1100 are shown in Table 1200. As shown, after the initial iteration, the sequential delay of node 1145 is computed to be four time units. The sequential delay of node 1115 can now be updated to be four time units (which is the maximum delay of its input paths) plus two time units (which is the delay attributed to the node itself and its associated wiring).

After all sequential delays are updated, node 1145 will have a sequential delay of six time units as shown for the second iteration in Table 1200. This new value of delay for 1145 results in an updated value of eight time units for the sequential delay of node 1115 in the third iteration. As shown in Table 1200, the sequential delay values do not converge for the given clock period. The value of the clock period has to be increased in order for the sequential delay values to converge. In a complicated netlist in which loops are not easily detectable, the sequential delay computation will be very time consuming and will take a long time to find an appropriate value of clock period for which all sequential delays converge.
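The non-convergence described above can be reproduced with a small iterative computation. The Python sketch below is illustrative and does not reproduce netlist 1100 or Table 1200: the loop topology and node names are hypothetical, and the sketch assumes that each clocked element contributes a negative delay equal to one clock period, as recited in the claims. The loop contains three computational elements of two time units each, so one trip around the loop adds six time units minus one clock period.

```python
NEG_INF = float("-inf")


def sequential_delays(edges, delay, source, max_iters=20):
    """Iteratively propagate sequential delays until nothing changes.

    edges: list of (src, dst) pairs; delay: node -> delay, negative for
    clocked elements. Returns (delays, converged).
    """
    seq = {node: NEG_INF for node in delay}
    seq[source] = delay[source]
    for _ in range(max_iters):
        changed = False
        for src, dst in edges:
            if seq[src] == NEG_INF:
                continue  # input not computed yet
            # Maximum over inputs plus the node's own delay.
            candidate = seq[src] + delay[dst]
            if candidate > seq[dst]:
                seq[dst] = candidate
                changed = True
        if not changed:
            return seq, True
    return seq, False  # still changing: treated as a failing loop


def loop_netlist(clock_period):
    # Hypothetical netlist: in -> a -> b -> c -> reg -> a (a loop).
    delay = {"in": 0, "a": 2, "b": 2, "c": 2, "reg": -clock_period}
    edges = [("in", "a"), ("a", "b"), ("b", "c"), ("c", "reg"), ("reg", "a")]
    return edges, delay


_, converged_at_4 = sequential_delays(*loop_netlist(4), source="in")
delays_at_8, converged_at_8 = sequential_delays(*loop_netlist(8), source="in")
```

With a clock period of four time units, every trip around the loop raises the delays by two time units, so the iteration never settles. With a period of eight time units the delays stop changing after the first pass.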

B. Timing Driven Placement Engines that Assign Event Times to Nodes in the Netlist

Typically, placement engines model the netlist by assigning a horizontal and a vertical coordinate (x and y location) to each node in the graph. In some embodiments, a new dimension is added to the placement engine by assigning an event time to each node in the netlist. In these embodiments, the placement engine performs a three dimensional placement.

FIG. 13 illustrates an edge 1300 between a source node 1305 and a target node 1310. In FIG. 13, in addition to the horizontal and vertical coordinates of a node, the placement engine has assigned an event time to each node. In some embodiments, the event time of a node is the arrival time of the signal to the node. In some embodiments, the event time of a node is the time a stable output signal is available at the output of the node. As shown, the source node 1305 is represented by three space-time values (x₀, y₀, t₀) and the target node 1310 is represented by three space-time values (x₁, y₁, t₁). The placement engine will optimize the functions of Δx, Δy, and Δt.

FIG. 14 conceptually illustrates a process 1400 utilized by a placement engine of some embodiments to perform timing analysis using assigned event times. As shown, the process assigns (at 1410) an event time to the source and target nodes of each edge in the netlist. As described above, each node will be represented by three variables x, y, and t representing horizontal coordinate, vertical coordinate, and time respectively. Next, the process defines a delay function that produces estimated interconnect delays. In some embodiments, the delay function for an edge is represented as d(Δx, Δy), where Δx and Δy are the differences between the horizontal and vertical coordinates of the target and source nodes of the edge as shown in FIG. 13.

Next, the process defines (at 1420) a cost function for the netlist. In some embodiments, the cost function is a function of the interconnect delays and the event times of the source and target nodes. In these embodiments, the cost function is expressed by the following equation (B):

$$\sum_{\text{edges},\, i} \mathrm{fn}\bigl(\Delta t_i,\ d(\Delta x_i, \Delta y_i)\bigr) \qquad \text{(B)}$$

where for each edge, i, $\Delta t_i$ is the difference between the event times of the target and the source nodes; $\Delta x_i$ is the difference between the x coordinates of the target and the source nodes; and $\Delta y_i$ is the difference between the y coordinates of the target and the source nodes.

Finally, the process optimizes (at 1425) the cost function based on given criteria for clock period, interconnect wiring length, congestion, etc. The process places (at 1430) the IC components after timing requirements are met. The placement meets timing requirements when:

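$$\forall\ \text{edges } i:\quad \Delta t_i \ge d(\Delta x_i, \Delta y_i) \qquad \text{(C)}$$

where for each edge, i, $\Delta t_i$ is the difference between the event times of the target and the source nodes and $d(\Delta x_i, \Delta y_i)$ is the delay function for edge i. In other words, the condition must hold on every edge individually.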

Since the cost function in equation (B) is based on edges of the graph, when the placement engine changes the event time of a particular node to optimize the cost function, only the time differences, Δt, for the edges that start or end on that particular node are affected. The placement engine does not have to recalculate the delays throughout the netlist.
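The locality of the edge-based cost can be illustrated with a short Python sketch. The specification does not define the delay function d or the per-edge function fn, so both are hypothetical choices here: `wire_delay` assumes a delay proportional to Manhattan distance, and `edge_cost` assumes a squared penalty on the amount by which Δt falls short of d. The node names and coordinates are also illustrative.

```python
from dataclasses import dataclass


@dataclass
class Node:
    x: float
    y: float
    t: float  # event time assigned by the placement engine


def wire_delay(dx, dy, per_unit=0.5):
    # Hypothetical d(dx, dy): proportional to Manhattan distance.
    return per_unit * (abs(dx) + abs(dy))


def edge_cost(src, dst):
    # Hypothetical fn(dt, d): squared shortfall when dt < d, zero otherwise.
    dt = dst.t - src.t
    d = wire_delay(dst.x - src.x, dst.y - src.y)
    return max(0.0, d - dt) ** 2


def total_cost(nodes, edges):
    # Equation (B): sum of the per-edge function over all edges.
    return sum(edge_cost(nodes[s], nodes[t]) for s, t in edges)


def cost_delta(nodes, edges, name, new_t):
    """Cost change from moving one node in time; visits incident edges only."""
    incident = [(s, t) for s, t in edges if name in (s, t)]
    before = sum(edge_cost(nodes[s], nodes[t]) for s, t in incident)
    old_t = nodes[name].t
    nodes[name].t = new_t
    after = sum(edge_cost(nodes[s], nodes[t]) for s, t in incident)
    nodes[name].t = old_t  # leave the placement unchanged
    return after - before


nodes = {"a": Node(0, 0, 0), "b": Node(4, 0, 1),
         "c": Node(4, 4, 2), "d": Node(8, 4, 5)}
edges = [("a", "b"), ("b", "c"), ("c", "d")]
initial = total_cost(nodes, edges)           # 2.0 with these values
delta = cost_delta(nodes, edges, "c", 3.0)   # -1.0: only b-c and c-d change
```

Moving node c in time re-evaluates two edges rather than the whole netlist, and the result agrees with a full recomputation of the sum.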

C. Timing Driven Placement Engines that Compute Sequential Delays

In some embodiments, the placement engine performs sequential delay computation as a part of its timing analysis. FIG. 15 conceptually illustrates a process 1500 utilized by the placement engine of some embodiments for performing sequential delay computation for a netlist. As shown, the process identifies (at 1505) source nodes and target nodes in the netlist. The process selects computational elements and timed elements as source and target nodes. In some embodiments, the process also identifies clocked elements that cannot be retimed as source and target nodes. However, the process does not consider retimable clocked elements as source or target nodes. The process then identifies “sequential edges”, which are paths that go from a source, through these unconsidered clocked elements, ending at a target. The sequential edges do not include any other intervening source or target nodes.

Next, the process assigns (at 1510) an event time to each source and target node which is on a sequential edge. As described above, in some embodiments each node is represented by three variables x, y, and t representing horizontal coordinate, vertical coordinate, and time respectively. In some embodiments, the event times are absolute values given from a time when the execution of the netlist will start during runtime.

Next, the process counts (at 1515) the number of clocked elements located on each sequential edge. FIG. 16 illustrates a technique that some embodiments employ to count the number of clocked elements on each sequential edge. As shown, path 1600 starts from a source node 1605 and ends at a target node 1610. Path 1600 has four computational elements 1615-1630 and three clocked elements 1635-1645. In this example, the source and target nodes are a primary input 1605 and a primary output 1610 respectively. As shown, path 1600 includes several smaller paths. Each one of these paths starts from a timed element or a computational element as the source node and ends at the next immediate timed element or computational element as the target node. For instance, the path from 1605 to 1615 is between a primary input and a computational element and the path from 1625 to 1640 is between a computational element and a clocked element.

In some embodiments, a placement engine conceptually transforms a path such as 1600 to a path such as 1650 in which the clocked elements are not considered as source or target nodes of the smaller paths. Instead, in path 1650, the smaller paths are sequential edges that start from either timed elements (other than retimable clocked elements) or computational elements as source nodes and end at the next timed element (other than a retimable clocked element) or computational element. In other words, the sequential edges are allowed to go through the retimable clocked elements. The number of clocked elements on each sequential edge is counted and is used in the computation of sequential delay as indicated further below. For example, the sequential edge between computational elements 1620 and 1625 goes through one clocked element while the sequential edge between computational elements 1625 and 1630 goes through two clocked elements. The number of clocked elements (when more than zero) is shown on top of each node in FIG. 16.
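The transformation from path 1600 to path 1650 amounts to a single pass that skips retimable clocked elements while counting them. The Python sketch below is one possible way to do this and is not taken from the specification. The node ordering in `path_1600` is an assumption inferred from the counts stated above (one clocked element between 1620 and 1625, two between 1625 and 1630), and the function name is hypothetical.

```python
def sequential_edges(path, retimable_clocked):
    """Collapse a path into (source, target, clocked_count) sequential edges.

    path: node identifiers in signal order, starting and ending on nodes
    that are not retimable clocked elements.
    retimable_clocked: set of identifiers of retimable clocked elements.
    """
    result = []
    source = path[0]
    count = 0
    for node in path[1:]:
        if node in retimable_clocked:
            count += 1  # the edge passes through this clocked element
        else:
            result.append((source, node, count))
            source, count = node, 0  # the target starts the next edge
    return result


# Assumed ordering of path 1600, inferred from the description of FIG. 16.
path_1600 = [1605, 1615, 1620, 1635, 1625, 1640, 1645, 1630, 1610]
clocked = {1635, 1640, 1645}
edges_1650 = sequential_edges(path_1600, clocked)
# [(1605, 1615, 0), (1615, 1620, 0), (1620, 1625, 1),
#  (1625, 1630, 2), (1630, 1610, 0)]
```

Clocked elements that cannot be retimed would simply be left out of `retimable_clocked`, so they terminate sequential edges like any other source or target node.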

Referring back to FIG. 15, the process defines (at 1520) a delay function that produces estimated interconnect delays. In some embodiments, the delay function for a sequential edge is represented as d(Δx, Δy), where Δx and Δy are the differences between the horizontal and vertical coordinates of the target and source nodes of the sequential edge.

Next, the process defines (at 1525) a cost function for all sequential edges in the netlist. In some embodiments, the cost function for a sequential edge is a function of (1) the interconnect delay, d(Δx, Δy), of the sequential edge and (2) the difference between the event times of the source and target nodes of the sequential edge. In these embodiments, the cost function is expressed by the following equation (D):

$$\sum_{\text{sequential edges},\, i} \mathrm{fn}\bigl(\Delta t_i,\ d(\Delta x_i, \Delta y_i)\bigr) \qquad \text{(D)}$$

where $\Delta x_i$ is the difference between the x coordinates of the target and the source nodes, $\Delta y_i$ is the difference between the y coordinates of the target and the source nodes, and $\Delta t_i$ is the difference between the event times of the target and the source nodes.

However, when there are retimable clocked elements on a sequential edge, the difference between the event times of the target node and the source node of the edge is increased by the number of clocked elements on the edge multiplied by the clock period. The cost function is, therefore, expressed by the following equation (E):

$$\sum_{\text{sequential edges},\, i} \mathrm{fn}\bigl(\Delta t_i + (\text{\# of clocked elements on sequential edge } i \times \text{clock period}),\ d(\Delta x_i, \Delta y_i)\bigr) \qquad \text{(E)}$$

Next, the process optimizes (at 1530) the cost function based on given criteria for clock period, interconnect wiring length, congestion, and other optimization criteria. The process analyzes (at 1535) the netlist and moves the clocked elements (if necessary) between the computational elements to make the delays between each two adjacent clocked elements less than or equal to a clock period. Finally, the process places the IC components.
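The adjustment in equation (E) can be illustrated numerically. The Python sketch below is a simplified illustration and not the optimization itself: it assumes that the timing condition of equation (C) carries over to sequential edges with the adjusted time difference, and the function names and example values are hypothetical.

```python
def adjusted_dt(t_source, t_target, clocked_count, clock_period):
    # First argument of fn in equation (E): the event time difference
    # increased by one clock period per clocked element on the edge.
    return (t_target - t_source) + clocked_count * clock_period


def edges_meet_timing(edges, clock_period):
    """edges: iterable of (t_source, t_target, clocked_count, wire_delay).

    Assumption: an edge passes when its adjusted time difference is at
    least its estimated interconnect delay.
    """
    return all(adjusted_dt(ts, tt, n, clock_period) >= d
               for ts, tt, n, d in edges)


# Hypothetical edges; the second one passes through one clocked element.
example_edges = [(0.0, 3.0, 0, 2.0), (3.0, 2.0, 1, 2.0)]
passes_at_4 = edges_meet_timing(example_edges, clock_period=4.0)  # True
passes_at_2 = edges_meet_timing(example_edges, clock_period=2.0)  # False
```

In the second edge the raw time difference is negative, but one clocked element on the edge adds a full clock period to it, so the edge passes at a period of four time units and fails at two.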

D. Placement Engines for Reconfigurable ICs

The embodiments disclosed in previous sections are applicable to any kind of IC such as application-specific integrated circuits (ASICs), structured ASICs, field-programmable gate arrays (FPGAs), programmable logic devices (PLDs), complex programmable logic devices (CPLDs), systems on chip (SOCs), system-in-packages (SIPs), reconfigurable ICs (e.g., space-time machines), etc. The embodiments disclosed in this section are applicable to reconfigurable ICs and the reconfigurable portions of SOCs, SIPs, etc. As described below, some embodiments are implemented to perform placement for reconfigurable ICs. In some embodiments with a sub-cycle reconfigurable IC, the placement engine determines which sub-cycle each computational element falls into.

FIG. 17 conceptually illustrates a process 1700 that determines which sub-cycle a computational element falls into. As shown, the process retimes (at 1705) the netlist by retiming clocked elements until all nodes meet their timing requirements. In some embodiments, the process uses one of the techniques described above to perform the retiming operation.

Next, for each node other than a retimable clocked element, the process determines (at 1710) the user clock cycle in which the element will be executed during the operation of the IC. In some embodiments, the placement engine determines the cycle for the node by performing the following equation (F):

$$\text{user cycle in which a node is executed} = \text{event time of the node} \mathbin{\backslash} \text{clock period} \qquad \text{(F)}$$

where \ denotes an integer division and the event time of the node is measured in absolute time. Next, the process determines (at 1720) the particular sub-cycle in which the node will be executed by determining the relative time between the beginning of the user cycle and the time the node is executed. In some embodiments, the placement engine determines the relative time from the beginning of the user cycle by performing the following equation (G):

$$\text{execution time of the node relative to the beginning of the user clock cycle} = \text{event time of the node} \bmod \text{clock period} \qquad \text{(G)}$$

In some embodiments, equations (F) and (G) are part of the same integer division operation. The operation divides the event time of the node by the clock period, where the quotient is shown in equation (F) and the remainder is shown in equation (G).

FIGS. 18-21 illustrate an example of assignment of sub-cycles to computational elements of a netlist in some embodiments. The clock period is assumed to be eight time units. As shown, the event times of nodes 1805-1820 are assigned by the placement engine to be 38.2, 42.5, 45, and 50.6 units of time. The event times are given in absolute values starting from a time when the execution of the netlist will begin during runtime.

FIG. 19 illustrates the user cycles to which each one of nodes 1805-1820 is assigned after performing the operations of equations (F) and (G). The execution time of each node is normalized relative to the beginning of the user cycle. The placement engine assigns node 1805 to 6.2 time units from the start of user cycle four (integer division of 38.2 by 8 results in a quotient of 4 and a remainder of 6.2). The placement engine assigns nodes 1810 and 1815 to 2.5 and 5 time units from the start of user cycle five (integer division of 42.5 and 45 by 8 results in a quotient of 5 and remainders of 2.5 for 1810 and 5 for 1815). Finally, node 1820 is assigned to 2.6 time units from the start of user cycle six (integer division of 50.6 by 8 results in a quotient of 6 and a remainder of 2.6).

FIG. 20 illustrates a time line that shows placements of nodes 1805-1820. As shown, node 1805 is placed in user cycle four 2005, nodes 1810-1815 are placed in user cycle five 2010, and node 1820 is placed in user cycle six 2015.

FIG. 21 illustrates a time line that shows the assignment of nodes 1805-1820 to sub-cycles for a reconfigurable IC in which the user clock has four sub-cycles. The four sub-cycles of user cycle five are sub-cycles 2105-2120. This user cycle is preceded by another cycle which also has four sub-cycles (only the last sub-cycle 2125 is shown). Also as shown, user cycle five is followed by another cycle that also has four sub-cycles (the first two sub-cycles 2130-2135 are shown). As shown, node 1805 is assigned to the last sub-cycle 2125 of user cycle four.

Similarly, node 1820 is assigned to the second sub-cycle 2135 of user cycle six, as shown in FIG. 21. The other two nodes 1810 and 1815 are assigned to user cycle five. The relative times of these nodes from the start of user cycle five place node 1810 in sub-cycle two 2110 and node 1815 in sub-cycle three 2115 of user cycle five, as shown.
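The assignments of FIGS. 19-21 can be reproduced with a single division per node. The Python sketch below is illustrative: it uses the built-in `divmod` to obtain the quotient of equation (F) and the remainder of equation (G) together, and it assumes that the four sub-cycles of a user cycle have equal length (two time units each), which the example implies but does not state. The function name is hypothetical.

```python
def assign_cycle(event_time, clock_period, sub_cycles):
    """Return (user_cycle, relative_time, sub_cycle) for one node."""
    # Quotient is equation (F); remainder is equation (G).
    user_cycle, relative_time = divmod(event_time, clock_period)
    # Assumption: sub-cycles evenly divide the user clock period.
    sub_cycle_length = clock_period / sub_cycles
    sub_cycle = int(relative_time // sub_cycle_length) + 1  # 1-based
    return int(user_cycle), relative_time, sub_cycle


event_times = {1805: 38.2, 1810: 42.5, 1815: 45.0, 1820: 50.6}
assignments = {node: assign_cycle(t, clock_period=8, sub_cycles=4)
               for node, t in event_times.items()}
# User cycle and sub-cycle per node:
#   1805 -> cycle 4, sub-cycle 4    1810 -> cycle 5, sub-cycle 2
#   1815 -> cycle 5, sub-cycle 3    1820 -> cycle 6, sub-cycle 2
```

Because the event times are floating point values, the remainders come out as approximately 6.2 and 2.6 rather than exactly those values. The cycle and sub-cycle indices are unaffected in this example.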

V. Electronics System

FIG. 22 conceptually illustrates a more detailed example of an electronics system 2200 that implements some of the above described inventions. The system 2200 can be a stand-alone computing device, or it can be part of another electronic device. As shown in FIG. 22, the system 2200 includes a processor 2205, a bus 2210, a system memory 2215, a non-volatile memory 2220, a storage device 2225, input devices 2230, output devices 2235, and a communication interface 2240. In some embodiments, the non-volatile memory 2220 stores configuration data and re-loads it at power-up.

The bus 2210 collectively represents all system, peripheral, and chipset interconnects (including bus and non-bus interconnect structures) that communicatively connect the numerous internal devices of the system 2200. For instance, the bus 2210 communicatively connects the processor 2205 with the non-volatile memory 2220, the system memory 2215, and the permanent storage device 2225.

From these various memory units, the processor 2205 receives data for processing and retrieves instructions to execute. The non-volatile memory 2220 stores static data and instructions that are needed by the processor 2205 and other modules of the system 2200. The storage device 2225 is a read-and-write memory device. This device is a non-volatile memory unit that stores instructions and/or data even when the system 2200 is off. Like the storage device 2225, the system memory 2215 is a read-and-write memory device. However, unlike storage device 2225, the system memory is a volatile read-and-write memory, such as a random access memory. The system memory stores some of the instructions and/or data that the processor 2205 needs at runtime.

The bus 2210 also connects to the input and output devices 2230 and 2235. The input devices enable the user to enter information into the system 2200. The input devices 2230 can include touch-sensitive screens, keys, buttons, keyboards, cursor-controllers, microphones, etc. The output devices 2235 display the output of the system 2200.

Finally, as shown in FIG. 22, bus 2210 also couples system 2200 to other devices through a communication interface 2240. Examples of the communication interface include network adapters that connect to a network of computers, or wired or wireless transceivers for communicating with other devices. One of ordinary skill in the art would appreciate that any other system configuration may also be used in conjunction with the invention, and these system configurations might have fewer or additional components.

Some embodiments include electronic components, such as microprocessors, storage, and memory that store computer program instructions (such as instructions for performing operations of a placement engine) in a machine-readable or computer-readable medium. Examples of machine-readable media or computer-readable media include, but are not limited to, magnetic media such as hard disks, memory modules, magnetic tape, optical media such as CD-ROMs and holographic devices, magneto-optical media such as optical disks, and hardware devices that are specially configured to store and execute program code, such as application specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), programmable logic devices (PLDs), ROM, and RAM devices. Examples of computer programs or computer code include machine code, such as produced by a compiler, and files including higher-level code that are executed by a computer, an electronic component, or a microprocessor using an interpreter.

While the invention has been described with reference to numerous specific details, one of ordinary skill in the art will recognize that the invention can be embodied in other specific forms without departing from the spirit of the invention. Thus, one of ordinary skill in the art would understand that the invention is not to be limited by the foregoing illustrative details, but rather is to be defined by the appended claims.

1-19. (canceled)
20. A method for designing an integrated circuit (IC), the method comprising: receiving a specification of a particular path between a source node and a target node in the IC, the particular path comprises a set of computational elements and a set of clocked elements; attributing a positive combinational delay to each of the computational elements and a negative combinational delay to each of the clocked elements, wherein the negative combinational delay attributed to each clocked element is based on a clock period for operating the clocked element; computing a sequential delay for the particular path by adding the combinational delays attributed to the clocked elements and the computational elements; and determining whether the sequential delay of the particular path meets a timing requirement.
21. The method of claim 20 further comprising retiming the clocked elements between computation elements to make the delays between each two adjacent clocked element less than or equal to the clock period.
22. The method of claim 21, wherein the cumulative delay being less than one clock period indicates that there exists a retiming of the clocked elements such that all elements in the particular path meet their required timing.
23. The method of claim 20, wherein each clocked element in the set of clocked elements is a retimable clocked element.
24. The method of claim 20, wherein the IC comprises a plurality of reconfigurable circuits, each reconfigurable circuit reconfigurable each cycle to implement a computational element.
25. The method of claim 20 further comprising increasing the clock period when the particular path fails to meet the timing requirement.
26. The method of claim 20 further comprising decreasing the clock period when the particular path successfully meets the timing requirement.
27. The method of claim 20, wherein the set of clocked elements in the particular path comprises a latch.
28. The method of claim 20, wherein the specification of the particular path is provided by a netlist.
29. A method for designing an integrated circuit (IC), the method comprising: receiving a specification of a particular path between a source node and a target node in the IC, the particular path comprises a set of computational elements and a set of clocked elements, wherein each computational elements is attributed a positive timing delay and each clocked element is attributed a negative delay that is based on a clock period for operating the clocked element; computing a cumulative timing delay from the source node to the target node by adding the timing delays attributed to the set of computational elements and the set of clocked elements; and determining whether the particular path meets a timing requirement at the target node based on the computed cumulative timing delay.
30. The method of claim 29 further comprising retiming the clocked elements between computation elements to make the delays between each two adjacent clocked element less than or equal to the clock period.
31. The method of claim 30, wherein the cumulative delay being less than one clock period indicates that there exists a retiming of the clocked elements such that all elements in the particular path meet their required timing.
32. The method of claim 29, wherein each clocked element in the set of clocked elements is a retimable clocked element.
33. The method of claim 29, wherein the IC comprises a plurality of reconfigurable circuits, each reconfigurable circuit reconfigurable each cycle to implement a computational element.
34. The method of claim 29 further comprising increasing the clock period when the particular path fails to meet the timing requirement.
35. The method of claim 29 further comprising decreasing the clock period when the particular path successfully meets the timing requirement.
36. The method of claim 29, wherein the set of clocked elements in the particular path comprises a latch.
37. The method of claim 29, wherein the specification of the particular path is provided by a netlist.
38. A non-transitory computer readable medium storing a program for execution by one or more processing units, the program comprising sets of instructions for: receiving a specification of a particular path between a source node and a target node in the IC, the particular path comprises a set of computational elements and a set of clocked elements; attributing a positive combinational delay to each of the computational elements and a negative combinational delay to each of the clocked elements, wherein the negative combinational delay attributed to each clocked element is based on a clock period for operating the clocked element; computing a sequential delay for the particular path by adding the combinational delays attributed to the clocked elements and the computational elements; and determining whether the sequential delay of the particular path meets a timing requirement.
39. The non-transitory computer readable medium of claim 38, wherein the program further comprising a set of instructions for retiming the clocked elements between computation elements to make the delays between each two adjacent clocked element less than or equal to the clock period.